3D Representation Learning for Shape Reconstruction and Understanding
The real world we live in is inherently composed of 3D objects. However, most existing work in computer vision focuses on images or videos, where 3D information is inevitably lost through camera projection. Traditional methods typically rely on hand-crafted algorithms and features, with many constraints and geometric priors, to understand the real world. Following the rise of deep learning, however, there has been exponential growth in research using deep neural networks to learn 3D representations of complex shapes and scenes, leading to cutting-edge applications in augmented reality (AR), virtual reality (VR), and robotics, and making 3D representation learning one of the most important directions in computer vision and computer graphics.
This thesis aims to build an intelligent system with dynamic 3D representations that change over time, so as to understand and recover the real world with semantic, instance, and geometric information, and eventually to bridge the gap between the real world and the digital world. As a first step towards these challenges, this thesis explores both explicit and implicit representations by directly addressing open problems in these areas. It starts from neural implicit representation learning for 3D scene representation and understanding, and then moves to a parametric-model-based explicit 3D reconstruction method. Extensive experimentation on benchmarks across various domains demonstrates the superiority of our methods over previous state-of-the-art approaches, enabling many real-world applications. Based on the proposed methods and current observations of open problems, the thesis concludes with a comprehensive summary and potential future research directions.
$T\bar{T}$ deformation in SCFTs and integrable supersymmetric theories
We calculate the $\mathcal{S}$-multiplets for two-dimensional Euclidean $\mathcal{N}=(0,2)$ and $\mathcal{N}=(2,2)$ superconformal field theories under the $T\bar{T}$ deformation at leading order of perturbation theory in the deformation coupling. Then, from these deformed multiplets, we calculate two- and three-point correlators. We show that the chiral ring's elements do not flow under the $T\bar{T}$ deformation. For the case of $\mathcal{N}=(2,2)$, we show that the twisted chiral ring and chiral ring cease to exist simultaneously. Specializing to integrable supersymmetric seed theories, such as $\mathcal{N}=(2,2)$ Landau-Ginzburg models, we use the thermodynamic Bethe ansatz to study the S-matrices and ground state energies. From both an S-matrix perspective and Melzer's folding prescription, we show that the deformed ground state energy obeys the inviscid Burgers' equation. Finally, we show that several indices independent of $D$-term perturbations, including the Witten index, the Cecotti-Fendley-Intriligator-Vafa index, and the elliptic genus, do not flow under the $T\bar{T}$ deformation.
Comment: 46 pages
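The Burgers'-equation statement in this abstract can be made explicit. In the standard $T\bar{T}$ conventions (the normalization of the coupling $\lambda$ below is an assumption, since the abstract does not fix one), the finite-volume energy $E_n(R,\lambda)$ of a state with momentum $P_n$ on a spatial circle of circumference $R$ flows as
\[
  \partial_\lambda E_n(R,\lambda) \;=\; E_n\,\partial_R E_n \;+\; \frac{P_n^2}{R},
\]
so for the zero-momentum ground state this reduces to the inviscid Burgers' equation $\partial_\lambda E_0 = E_0\,\partial_R E_0$, which can be solved implicitly by the method of characteristics.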
SignAvatars: A Large-scale 3D Sign Language Holistic Motion Dataset and Benchmark
In this paper, we present SignAvatars, the first large-scale multi-prompt 3D
sign language (SL) motion dataset designed to bridge the communication gap for
hearing-impaired individuals. While research on digital communication has grown rapidly, the majority of existing communication technologies primarily cater to spoken or written languages rather than SL, the essential communication method for hearing-impaired communities. Existing SL datasets, dictionaries, and sign language production (SLP) methods are typically limited to 2D, as annotating 3D models and avatars for SL is usually an entirely manual, labor-intensive process conducted by SL experts, often resulting in unnatural avatars. In response to
these challenges, we compile and curate the SignAvatars dataset, which
comprises 70,000 videos from 153 signers, totaling 8.34 million frames,
covering both isolated signs and continuous, co-articulated signs, with
multiple prompts including HamNoSys, spoken language, and words. To yield 3D
holistic annotations, including meshes and biomechanically-valid poses of body,
hands, and face, as well as 2D and 3D keypoints, we introduce an automated
annotation pipeline operating on our large corpus of SL videos. SignAvatars
facilitates various tasks such as 3D sign language recognition (SLR) and the
novel 3D SL production (SLP) from diverse inputs like text scripts, individual
words, and HamNoSys notation. To evaluate the potential of SignAvatars, we further propose a unified benchmark of 3D SL holistic motion production. We believe this work is a significant step towards bringing the digital world to hearing-impaired communities. Our project page is at
https://signavatars.github.io/
Comment: 9 pages; project page available at https://signavatars.github.io
Decomposed Human Motion Prior for Video Pose Estimation via Adversarial Training
Estimating human pose from video is a task that receives considerable
attention due to its applicability in numerous 3D fields. The complexity of
prior knowledge of human body movements poses a challenge to neural network
models in the task of regressing keypoints. In this paper, we address this
problem by incorporating a motion prior in an adversarial way. Unlike previous methods, we propose to decompose the holistic motion prior into joint-level motion priors, making it easier for neural networks to learn from prior knowledge and thereby boosting performance on the task. We also utilize a novel regularization loss to balance the accuracy and smoothness introduced by the motion prior. Our method achieves 9% lower PA-MPJPE and 29% lower acceleration error than previous methods on the 3DPW benchmark. The estimator further proves its robustness by achieving impressive performance on in-the-wild datasets.
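The accuracy-versus-smoothness trade-off described in this abstract can be illustrated with a minimal sketch. All names and the weighting scheme below are hypothetical stand-ins, not the paper's actual loss or network: the sketch only shows a keypoint-accuracy term combined with an acceleration-based smoothness regularizer over a pose sequence.

```python
import numpy as np

def accuracy_loss(pred, gt):
    # Mean per-joint position error over a (T, J, 3) pose sequence.
    return np.linalg.norm(pred - gt, axis=-1).mean()

def smoothness_loss(pred):
    # Penalize second-order differences (acceleration) of each joint
    # trajectory, encouraging temporally smooth motion.
    accel = pred[2:] - 2.0 * pred[1:-1] + pred[:-2]
    return np.linalg.norm(accel, axis=-1).mean()

def total_loss(pred, gt, w_smooth=0.1):
    # w_smooth trades keypoint accuracy against smoothness;
    # its value here is illustrative only.
    return accuracy_loss(pred, gt) + w_smooth * smoothness_loss(pred)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.normal(size=(16, 24, 3))               # 16 frames, 24 joints
    pred = gt + 0.01 * rng.normal(size=gt.shape)    # noisy prediction
    print(total_loss(pred, gt))
```

Note that a perfectly linear trajectory has zero acceleration, so the regularizer only penalizes jitter, not motion itself.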
U3DS: Unsupervised 3D Semantic Scene Segmentation
Contemporary point cloud segmentation approaches largely rely on richly
annotated 3D training data. However, it is both time-consuming and challenging
to obtain consistently accurate annotations for such 3D scene data. Moreover,
there is still a lack of investigation into fully unsupervised scene
segmentation for point clouds, especially for holistic 3D scenes. This paper
presents U3DS, as a step towards completely unsupervised point cloud
segmentation for any holistic 3D scenes. To achieve this, U3DS leverages a
generalized unsupervised segmentation method for both object and background
across both indoor and outdoor static 3D point clouds with no requirement for
model pre-training, by leveraging only the inherent information of the point
cloud to achieve full 3D scene segmentation. The initial step of our proposed
approach involves generating superpoints based on the geometric characteristics
of each scene. Subsequently, it undergoes a learning process through a spatial
clustering-based methodology, followed by iterative training using
pseudo-labels generated in accordance with the cluster centroids. Moreover, by
leveraging the invariance and equivariance of the volumetric representations,
we apply the geometric transformation on voxelized features to provide two sets
of descriptors for robust representation learning. Finally, our evaluation provides state-of-the-art results on the ScanNet and SemanticKITTI benchmark datasets, and competitive results on S3DIS.
Comment: 10 pages, 4 figures; accepted to the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024
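The cluster-then-pseudo-label loop described in this abstract can be sketched in miniature. The code below is an illustrative toy only (the features, the plain k-means clustering with farthest-point initialization, and all parameters are stand-ins, not U3DS's actual pipeline): features are clustered, and each point's cluster assignment becomes its pseudo-label for the next training round.

```python
import numpy as np

def kmeans_pseudo_labels(features, k, iters=10):
    """Cluster per-point features and return (pseudo_labels, centroids).

    Stands in for the spatial-clustering step: each point's pseudo-label
    is the index of its nearest cluster centroid.
    """
    # Farthest-point initialization: deterministic and well spread out.
    centroids = [features[0]]
    for _ in range(k - 1):
        d = np.min(np.linalg.norm(
            features[:, None] - np.array(centroids)[None], axis=-1), axis=1)
        centroids.append(features[int(d.argmax())])
    centroids = np.array(centroids)

    for _ in range(iters):
        # Assign every feature vector to its nearest centroid.
        d = np.linalg.norm(features[:, None] - centroids[None], axis=-1)
        labels = d.argmin(axis=1)
        # Recompute centroids from the current assignment.
        for c in range(k):
            mask = labels == c
            if mask.any():
                centroids[c] = features[mask].mean(axis=0)
    return labels, centroids

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two well-separated blobs playing the role of per-point features.
    feats = np.vstack([rng.normal(0.0, 0.1, (50, 8)),
                       rng.normal(5.0, 0.1, (50, 8))])
    labels, _ = kmeans_pseudo_labels(feats, k=2)
    print(labels[:3], labels[-3:])
```

In the full method these pseudo-labels would supervise the next training iteration, with the equivariance constraint applied to geometrically transformed copies of the voxelized features.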
The Hitchhiker's Guide to 4d $\mathcal{N}=2$ Superconformal Field Theories
Superconformal field theory with $\mathcal{N}=2$ supersymmetry in four-dimensional spacetime provides a prime playground for studying strongly coupled phenomena in quantum field theory. Its rigid structure ensures valuable analytic control over non-perturbative effects, yet the theory is still flexible enough to incorporate a large landscape of quantum systems. Here we aim to offer a guidebook to fundamental features of 4d $\mathcal{N}=2$ superconformal field theories and basic tools to construct them in string/M-/F-theory. The content is based on a series of lectures at the Quantum Field Theories and Geometry School (https://sites.google.com/view/qftandgeometrysummerschool/home) in July 2020.
Comment: v3: improved discussion, fixed typos, added references. v2: fixed typos, added references. v1: 96 pages. Based on a series of lectures at the Quantum Field Theories and Geometry School in July 2020
P2-Net: Joint Description and Detection of Local Features for Pixel and Point Matching
Accurately describing and detecting 2D and 3D keypoints is crucial to
establishing correspondences across images and point clouds. Despite a plethora
of learning-based 2D or 3D local feature descriptors and detectors having been
proposed, the derivation of a shared descriptor and joint keypoint detector
that directly matches pixels and points remains under-explored by the
community. This work takes the initiative to establish fine-grained
correspondences between 2D images and 3D point clouds. In order to directly
match pixels and points, a dual fully convolutional framework is presented that
maps 2D and 3D inputs into a shared latent representation space to
simultaneously describe and detect keypoints. Furthermore, an ultra-wide reception mechanism, in combination with a novel loss function, is designed to mitigate the intrinsic information variations between pixel and point local regions. Extensive experimental results demonstrate that our framework shows
competitive performance in fine-grained matching between images and point
clouds and achieves state-of-the-art results for the task of indoor visual
localization. Our source code will be available at [no-name-for-blind-review].
Comment: ICCV 2021
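Once pixels and points live in a shared descriptor space, establishing correspondences reduces to nearest-neighbor search over that space. The snippet below illustrates that final matching step only (the descriptors are random stand-ins, not outputs of the paper's network or detector): it performs mutual nearest-neighbor matching between L2-normalized 2D and 3D descriptors.

```python
import numpy as np

def mutual_nn_matches(desc_2d, desc_3d):
    """Return (pixel_idx, point_idx) pairs that are mutual nearest
    neighbors in the shared descriptor space."""
    # Cosine similarity between every pixel and point descriptor.
    a = desc_2d / np.linalg.norm(desc_2d, axis=1, keepdims=True)
    b = desc_3d / np.linalg.norm(desc_3d, axis=1, keepdims=True)
    sim = a @ b.T
    nn_12 = sim.argmax(axis=1)   # best point for each pixel
    nn_21 = sim.argmax(axis=0)   # best pixel for each point
    # Keep only pairs that choose each other.
    return [(i, j) for i, j in enumerate(nn_12) if nn_21[j] == i]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(40, 128))               # stand-in 3D descriptors
    pix = pts + 0.01 * rng.normal(size=pts.shape)  # noisy 2D counterparts
    print(len(mutual_nn_matches(pix, pts)))        # prints 40
```

The mutual-consistency check discards one-sided matches, a common way to suppress outliers before pose estimation or visual localization.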